Preface


The purpose of this study was to identify criteria with
the greatest predictive validity for the successful performance
of first-line supervisors, which could be used by middle managers
to improve the percentage of successful first-time supervisors
selected. A validity generalization model developed by
Schmidt  and  Hunter  was  used  to  process  data  from  the  findings 
of  many  studies,  correcting  for  range  variation,  sample  size 
and  measurement  errors  to  ensure  conservative  estimates  of 
predictor  validity  coefficients. 
support  in  the  face  of  some  unexpected  disappointments.  Another
word  of  thanks  is  due  Mr.  Joel  Rice,  a  true  computer  expert,  who 
made  the  impossible  happen. 

Finally,  I  want  to  express  my  gratitude  and  appreciation 
that  one  day  I  would  actually  finish  this  thesis.
Abstract

This  study  identified  criteria  with  predictive  validity 
for  the  successful  performance  of  first-line  supervisors. 
Through meta-analysis it is possible to generalize the validity
of predictors across cumulative studies. A validity generalization
model, which corrects for artifactual variance, was
used  to  process  data  from  the  findings  of  many  previous  studies 
based  upon  the  job  performance  measurement  criterion  of  first- 
line  supervisors. 

Analysis  revealed  two  predictors,  How  Supervise?  and  the
General  Mental  Ability  Measures,  with  substantial  validity 
for  predicting  successful  performance  of  first-line  supervisors. 
The  Bennett  Mechanical  Comprehension  Test  and  the  Otis  Mental 
Ability  Test  also  showed  relatively  high  predictive  validity; 
however,  neither  form  of  the  Leadership  Opinion  Questionnaire 
evidenced  predictive  validity  for  successful  job  performance 
by  first-line  supervisors.  The  Wonderlic  Personnel  Test, 
although  not  highly  predictive,  may  be  useful  in  the  absence 
of  the  other  predictors. 


CRITERIA  FOR  SELECTION  OF  FIRST-LINE  SUPERVISORS 

I.  Introduction 

General  Issue 

"Supervision  is  the  function  of  leading,  coordinating, 
and  directing  the  work  of  others  to  accomplish  designated 
objectives" (3:532). "[It] is a matter of getting the job
done through others" (5:xiii). "Supervisor," a generic term,
"can be properly used to apply to persons in charge of production
workers as well as those in charge of office
workers" (3:533).

In  a  typical  organizational  structure,  three  divisions 
of  management  are  commonly  recognized — top,  middle,  and 
lower  (3:533).  "The  first  level  of  supervision  is  at  the 
bottom  of  the  management  level"  (5:3)  and  is  one  of  the  most 
critical  elements  in  organizational  effectiveness  (3:532). 
Middle-level  managers  have  the  responsibility  of  selecting 
first-level  supervisors  who,  generally,  have  no  previous
supervisory  experience  (5:4-5). 

According  to  Bittel, 

The performance of first-line supervisors ... is
judged on two prime accounts-- (1) how well the supervisor
has managed the inputs or resources and (2) how
good the results are in terms of the volume, quality,
and cost of the products and services produced. (4:25)

Too often, the first-level supervisor's performance is less
than satisfactory in directing his/her group of employees
toward accomplishment of organizational goals.

In  one  research  investigation  of  the  attitudes  and 
motivation  of  200  accountants  and  engineers  employed 
in  nine  companies,  supervision  was  found  to  be  one  of 
the chief causes for dissatisfaction. The dissatisfaction
related to such aspects of supervision as
incompetency, poor scheduling of work, lack of teaching
ability, unfriendly relations, lack of support,
and  unwillingness  to  listen  to  suggestions.  (3:532) 

Furthermore, when management "selects a person who becomes a
poor supervisor . . . it is a costly error" (5:42-43). The
supervisory selection decision affects many workers, and a poor
choice is both difficult and time-consuming to rectify (5:14).


Specific  Problem 

Much has been written identifying and discussing the
characteristics of a successful supervisor. Northrup and
others, in The Objective Selection of Supervisors, quote
Milton M. Mandell, stating

A 'good' supervisor is one who 'generates job satisfaction,
reduces absenteeism and turnover, improves
the quality of work by training his employees, and
serves as a source of stability during emergencies'.
(15:3)

Broadwell  offers  a  list  of  characteristics  "common  to  all 
who  have  shown  a  high  degree  of  success  at  supervision"  (5:161). 


1.  WILLINGNESS  TO  WORK 

2.  WILLINGNESS  TO  TAKE  RISKS 

3.  ENTHUSIASM 

4.  EMPATHETIC

5.  ABILITY  TO  MOTIVATE 

6.  ABILITY  TO  COMMUNICATE  (5:161-162) 

Weger, in Motivating Supervisors, cites Edgar Schein's
"complex model of man" which suggests:

1. The supervisor must be a good analyst in terms of
how  he  manages  the  people  under  him,  inasmuch  as 
their  motives  are  so  complex  and  so  variable. 

2.  The  supervisor  must  have  fine  sensitivity  to  people 
to  be  able  to  appreciate  the  subtle  differences  in 
what  motivates  people,  particularly  the  individuals 
in  his  work  group. 

3.  The  supervisor  must  have  personal  flexibility  and  the 
ability  to  meet  the  challenging  needs  and  motives  of 
his  subordinates.  (26:59) 

The  problem,  therefore,  is  not  in  describing  the  desired 
characteristics  of  successful  supervisors,  nor  even  in  the 
evaluation of supervisors' performance to determine their
degree of success in organizational effectiveness. The problem
is to develop criteria which will identify those employees
possessing a high degree of potential for successful performance
as first-line supervisors. The use, then, of such criteria
in  personnel  selection  procedures  would  increase  the  percentage 
of  successful  first-time,  first-line  supervisors. 


Background 


Historically,  personnel  placement  was  less  than  optimal. 

The job of supervising work and people has always
been thought of in the past as a rather simple, direct
operation. A workman elevated to the position of
gang or office boss had only to exercise the authority
vested in his new job to get people to work ....

The blame for failure to follow orders or poor workmanship
fell on the worker, not on the foreman. (13:3)

When  the  pressure  of  war  production  .  .  .  absorbed 
the  labor  surplus  .  .  .  [and]  the  supply  of  skilled 
mechanics,  office  personnel,  and  general  workers  [was] 
suddenly  cut  off  .  .  .  foremen  and  supervisors  were 
told  to  make  every  effort  to  keep  workers  on  the  job, 
to  improve  the  skills  of  these  people,  and  to  change 
their  methods  of  dealing  with  subordinates.  Most 
supervisors  were  incapable  of  making  the  required 
changes  in  methods,  and  as  a  result  a  chaotic  period 
of  readjustment  followed.  (13:4) 

"The  old-time,  two-fisted,  hard-boiled  boss  .  .  .  [was] 
replaced  by  the  modern  leader"  (13:7).  Rapidly  expanded 
educational  opportunities  resulted  in  "professionalization  of 
the supervisor’s job" (13:8). However, "as the job became
more professionalized, the problem of selecting suitable candidates
to fill these jobs became more difficult" (13:8-9).

Thus, aptitude and achievement tests used in personnel selection
processes developed extensively between 1920 and 1940 (24:1).

"Personnel  testing  has  one  specific  objective:  to  contri¬ 
bute  to  the  increasingly  effective  use  of  manpower  within 
an organization" (10:3). During the past thirty years an
abundance of material has been published detailing psychological
tests and assessment methods with predictive validity in
identifying personnel with high supervisory or managerial
potential. So many tests are available that several comprehensive
references have been compiled providing descriptive summaries
of assessment tests in psychology, education, and business (23).

Perhaps the oldest and most widely recognized measurement
of supervisory potential is How Supervise?, developed in
the 1940s by Q. W. File and H. H. Remmers. How Supervise? was
"designed  to  measure  'knowledge  and  insight  concerning  human 
relations  in  industry'"  (10:470).  The  targeted  population  for 
How  Supervise?  is  supervisory  personnel:  two  forms  are  intended 
for  first-line  supervisors,  and  a  third  form  is  designed  for 
higher level supervisors (10:470). Subsequent studies, however,
indicate a lack of data to support the predictive validity of
How Supervise? scores in successful supervisory performance
(7:406). Results of these studies suggest How Supervise? may,
in fact, "measure 'supervisory knowledge'" or "verbal intelligence"
(14:405).

Another early-developed measurement instrument, the
Leadership Opinion Questionnaire, evidences an expected correlation
between "Consideration" scores and future supervisory
success  in  companies  in  which  progressive  personnel  practices 
are  supported.  Actual  predictive  validity  was  biased  by  the 
extent  to  which  supervisors  conformed  (or  failed  to  conform) 
to  the  corporate  attitude  (2:345). 

Spitzer and McNamara conducted a study in the early 1960s
"to determine if tests could be selected which would be
positively related to managerial success" and, therefore,
have predictive value in "selecting employees who have good
managerial potential" (22:19). Findings showed the use of
the Otis Test of Mental Ability, and the Background and
Contemporary Data Form (a behavioral personality index)

in the selection procedure would increase the number
of managers capable of performing in an above-average
manner . . . [by] as much as 36 per cent over the
present situation as measured by the criterion of the
study. (22:24,19)

More recent studies by Rowland and Scott reveal significantly
positive relationships between work group performance
(perceived to be a factor of successful supervisory performance)
and the supervisor's intelligence, need for aggression,
and  influence  with  superiors  (measured  by  Edwards  Personal 
Preference  Schedule)  (17:375). 

The  current  trend  is  toward  the  use  of  assessment  centers 
to  determine  supervisory/managerial  potential  and  predict 
candidates'  successful  performance  (25:595). 

The  combination  of  management's  desire  to  select 
managers  of  high  quality  and  the  federal  government's 
policy  of  demanding  objective  proof  of  nondiscriminatory
selection  has  generated  substantial  interest  in  the 
assessment  center  method. 

The  assessment  center  method  is  holistic;  it  engages 
almost  totally  the  personalities  of  candidates  for 
upgrading  and  provides  an  analytical  and  evaluative 
structure  whereby  a  number  of  assessors  can  observe  the 
candidates  and  can  combine  several  predictors  to  form  an 
overall  rating  of  potential  for  each  candidate.  (15:91) 
Considerable disparity exists, however, regarding the
predictive validity of assessment center criteria. Dunnette
and Borman, in the Annual Review of Psychology, 1979, found
"reports have been favorable to the assessment center method"
(8:510). Five years later, however, Turnage and Muchinsky
identified a number of problems associated with the assessment
center evaluations. Two fundamental problems are:
(1) absence of reliable and objective supervisory performance
measures, and (2) lack of comparative data for "alternative
predictors" (25:595-596). In summary, Turnage and Muchinsky
found

that (a) both assessment center evaluations and traditional
predictors were generally unrelated to job
performance but that (b) assessment center evaluations
were predictive of promotability. (25:595)

The California Psychological Inventory (CPI) is designed
to measure

poise, ascendancy, . . . self-assurance, . . . socialization,
maturity, . . . responsibility . . . achievement
potential and intellectual efficiency . . . and
intellectual and interest modes. (10:316)

Findings reveal that use of only six of the 18 scales (dominance,
good impression, communality, flexibility, femininity,
and intellectual efficiency) gives "a less ambiguous interpretation
of the total profile" (10:315-317). Harrison G. Gough,
Institute of Personality Assessment and Research, University
of California, Berkeley, conducted a new analysis of the
findings of the CPI "good manager" scale developed in 1963
by Goodstein and Schrader. Both the original (1963) and
revised (1983) versions of the CPI signify predictive validity
for managerial success. Gough's conclusion recommended additional
new research addressing "explicit measures of managerial
performance for both sexes, . . . interactional issues,
and . . . longitudinal criteria" (9:233).

Leaetta  M.  Hough,  Personnel  Decision  Research  Institute, 
Minneapolis,  Minnesota,  recently  developed  the  "Accomplishment 
Record"  (AR)  method  of  identifying  professionals  for  selection 
and promotion. The AR Inventory is based upon the hypothesis
that "'the best indicator of future performance is past performance'"
(11:135). Initial results suggest the AR Inventory
"is equally predictive and fair" for men, women, minorities,
and nonminorities. Further, "the AR Inventory appears to
measure aspects of the individual not tapped by more traditional
methods of testing," and connote significant predictive
validity in job performance measures (11:142,143,146).

Scope  of  Research 

The  literature  includes  studies  relative  to  the  selection 
of supervisory and/or managerial personnel. Northrup and
others note:

Although few jobs have received as much attention as
has the job of first-line supervisor, a search discloses
that, at least until recently, the literature
on supervisor selection has been relatively sparse
.... Most writings . . . deal with three areas:
the  role  of  supervisors,  their  qualifications,  or 
their  training.  Some  of  the  literature  .  .  .  concerns 
itself  with  supervisor  selection,  but  even  that  deals 
largely  with  different  techniques  .  .  .  for  improving 
the  selection  process.  (15:11) 

An  aggregation  of  data  resulting  from  findings  of  previous 
studies  is  available  for  input  into  a  validity  generalization 
model.  The  data  are  the  correlation  coefficients  derived  from 
the  scoring  of  selected  predictors  administered  to  first-line 
supervisors rated by various degrees of successfulness. Validity
generalization models have proven effective in identifying
criteria  with  predictive  validity  for  selecting  first-line 
supervisors  with  high  potential  for  successful  performance. 

Research  Objectives 

What predictors can be administered to first-time supervisory
candidates which will provide a significant correlation
with successful job performance? What criteria exist for predicting
the probability of successful performance by first-time,
first-line supervisors? How can such data be used in
developing  criteria  for  the  personnel  selection  process  to 
assist  managers  in  selecting  first-time,  first-line  supervisors 
who  will  be  successful  in  their  positions? 


II.  Methodology


Method  of  Approach 

The method of approach to be used for this study is
meta-analysis, utilizing the Schmidt-Hunter Validity Generalization
Model. "Validity generalization is the degree to which
inferences from scores on tests can be transported across different
situations" (6:94). Meta-analysis is the term used to
describe  "the  statistical  analysis  of  the  findings  of  many 
individual  analyses"  (21:352).  It  is  a  method  of  integrating 
the  findings  of  research  literature  from  numerous  studies  in 
order  to  make  sense  of  them  (21:352-355). 

An existing data base developed by Shane (20) is used to
provide the statistical inputs for the modified Schmidt-Hunter
model. The existing data included findings from studies published
through May 1977. A literature search failed to reveal
any more recent statistical data which cited "job performance"
as the measurement criterion. Therefore, the previously compiled
data constitute the whole of the statistical inputs used
in this study.

Statistical data were gleaned from studies which reported
the correlation between selected test scores and successful
performance as first-line supervisors using job performance
ratings as the measurement criterion. The selected tests
include: a general mental ability measure, the Wonderlic
Personnel Test, Otis Mental Ability Test, Bennett Mechanical
Comprehension Test, How Supervise?, and two forms of the
Leadership Opinion Questionnaire. The sample size and correlation
coefficients from each study are input into the Schmidt-Hunter
Validity Generalization Model, and an analysis of the
output data identifies those tests having predictive validity
for the successful performance of first-line supervisors.

These  findings  may  then  be  useful  in  developing  selection 
criteria  which  will  increase  the  probability  of  identifying 
and  selecting  candidates  for  promotion  who  will  perform  well 
as  first-time,  first-line  supervisors. 

Methodology  Literature  Review 

The term "meta-analysis" was coined by Gene V. Glass in
1976 to describe the process of integrating research findings
from many studies to ensure more meaningful and comprehensible
results (21:352-354). Hunter, Schmidt, and Jackson reviewed
"all the methods that have been proposed for cumulating knowledge
across studies," referring "to the averaging methods as
'meta-analysis'" (12:11). According to Schmidt and others,


Cumulation of results can be used whenever there are
at least two studies with data bearing on the same
relation. (12:28)

Ideally, cumulation of results works best if it is
based on a large number of studies acquired by exhaustive
search procedures. However, cumulation is also
valid  for  'convenience'  samples  of  studies  that  just 
happen  to  lie  at  hand.  This  is  particularly  true  if 
the  corrected  standard  deviation  suggests  that  all  the 
variation  across  studies  is  due  to  sampling  error  .  .  .  . 
then  the  accuracy  of  the  mean  value  in  relation  to  the 
one  true  population  value  is  determined  by  the  total 
number  of  subjects  across  studies.  Even  a  relatively 
small  number  of  studies  may  have  a  large  cumulative 
sample  size.  (12:29) 

In meta-analysis correlations are corrected through
mathematical formulas for sampling error, measurement error,
and range variation across the studies. The correction formulas
are resident in the Schmidt-Hunter Validity Generalization
Model.
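
To illustrate the kind of correction described above, the following Python sketch applies two standard psychometric formulas to a single observed validity coefficient: disattenuation for criterion unreliability and the correction for direct range restriction on the predictor. It is an illustration only and not the Schmidt-Hunter program used in this study; the function names and the input values (an observed correlation of .30, a criterion reliability of .64, and a restricted-to-unrestricted standard deviation ratio of .50) are assumed for the example.

```python
import math


def correct_for_criterion_unreliability(r_observed, criterion_reliability):
    """Disattenuate an observed validity for unreliability in the criterion.

    criterion_reliability is the reliability of the job performance rating
    (a hypothetical input here), expected to lie in (0, 1].
    """
    return r_observed / math.sqrt(criterion_reliability)


def correct_for_range_restriction(r_restricted, sd_ratio):
    """Correct a validity for direct range restriction on the predictor.

    sd_ratio is the restricted SD divided by the unrestricted SD of the
    predictor (hypothetical input); a ratio of 1.0 means no restriction.
    """
    big_u = 1.0 / sd_ratio
    denominator = math.sqrt(1.0 + (big_u ** 2 - 1.0) * r_restricted ** 2)
    return (big_u * r_restricted) / denominator


if __name__ == "__main__":
    r = 0.30  # illustrative observed correlation, not a value from this study
    print(round(correct_for_criterion_unreliability(r, 0.64), 3))
    print(round(correct_for_range_restriction(r, 0.50), 3))
```

Both corrections raise the observed coefficient, which is why uncorrected validities tend to understate the relationship between a predictor and job performance.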

Currently,  six  different  procedures  have  been  defined 
and  tested  for  estimating  the  mean  and  variance  of 
true validity coefficients. These procedures or equations
are the noninteractive procedure (Pearlman et al.,
1980), the interactive procedure (Schmidt, Gast-Rosenberg,
and Hunter, 1980), the independent and dependent multiplicative
equations (Callender and Osborn, 1980), and the
Taylor-Series Approximations 1 and 2 (Raju and Burke,
1983). The noninteractive procedure is an improved
version of the Schmidt-Hunter (1977) procedure. (6:97)

Although several different formulas have been developed,
"studies have found only trivial differences that would be
expected from the minute size of the product terms involved"
(12:92).
The derivation of different formulas has, however,
resulted in the promulgation of a number of critiques of
the Schmidt-Hunter model. Lauding Schmidt and Hunter as

pioneers in introducing the basic notions of Bayesian
statistics into the field of validity research, so
that results of preceding studies can be used to
develop cumulative knowledge, (1:208)

Algera and others, 1984, nonetheless

criticized . . . the compilation of validity data, the
use of criterion measures, and the test of the hypothesis
of no situational specificity, [concluding] the
Schmidt-Hunter approach to validity generalization
shows fundamental shortcomings. (1:197)

Results  of  recently  conducted  computer  simulation  studies 
to  assess  the  accuracy  of  the  different  validity  generalization 
procedures 

tend  to  support  the  accuracy  of  the  various  procedures 
for  estimating  the  mean  and  variance  of  true  validity 
coefficients .... however, [they] do suggest that
there is some room for improvement in procedures to
determine validity generalization based on the correlation
model. (6:106)


Justification  of  the  Approach 

Prior  to  1976  personnel  psychologists  generally  believed 
"that  meaningful  empirical  validation  studies  are  possible  for 
most,  if  not  all,  jobs  in  most  organizations"  (18:529).  Schmidt, 
Hunter,  and  Urry,  as  a  result  of  a  study  in  1976,  concluded 
"that  empirical  validity  studies  are  'technically  feasible' 
much  less  frequently  than  the  profession  [had]  assumed"  (18:529). 
A subsequent study by Schmidt and Hunter addressing "the
belief that test validity is generally highly situation
specific" (18:529) resulted in the development of a validity
generalization model which, in many instances, permits
"validity generalization to new settings without carrying out
a validation study of any kind," or, in any case, "provides
an improved method of data analysis and decision making for
the necessary situational validity study" (18:529). Schmidt
and Hunter found

evidence suggesting that much of the variance in the
outcomes of validity studies within job-test combinations
may be due to statistical artifacts. (18:529)

In a well-executed large sample series of studies
conducted by [Brogden], it was found that when Army
occupations were classified rationally into job families,
tests showed essentially identical validities
and regression weights for all jobs within a given
family. (18:530)

Thus, 

Brogden . . . concluded that when methodological artifacts
are controlled and large samples are used . . .
obtained validities are in fact quite stable and similar
across time and situations for similar jobs. (18:530)

The Schmidt-Hunter model, based upon the principles of
Bayesian statistics,

directly relates methods of data analysis used in
making inferences about validity in criterion-related
validity studies to the concept of validity generalization.
The generalizability of validity is seen to
be a matter of degree and is quantified in the properties
of the prior distribution . . . [providing] a
direct answer to the question of whether validity
generalization is justified or not without a situation-specific
empirical validation study. (18:531)

Some important features of the Schmidt-Hunter model which
enhance its "credibility and acceptance" are:

First, the Bayesian priors to be used . . . are empirically
determined based on data from past studies.

Second, the assumptions made about between-study variance
in criterion reliability and range restriction
are conservative.

Third,  certain  sources  of  error  variance  in  the  obtained 
distribution  [computational  and  typographical  errors] 
are  not  corrected  for,  further  ensuring  conservatism. 

Fourth,  corrections  made  to  the  mean  of  the  prior  for 
average  range  restriction  effects  .  .  .  probably  [tend] 
to  underestimate  the  true  mean  of  the  corrected  prior.

Fifth,  .  .  .  this  procedure  provides  a  parsimonious, 
sophisticated,  and  technically  sound  solution  for  the 
overarching  problem  of  validity  generalization  .  .  . 
[which]  may  lead  to  large  dollar  savings  by  eliminating 
the  need  for  many  criterion-related  validity  studies. 

Sixth,  the  model  can  be  extended  to  provide  an  improved 
method  of  data  analysis  and  decision  making  in  criterion- 
related  validity  studies. 

Finally,  the  model  .  .  .  provides  a  tool  that  may  lead 
to  the  establishment  of  general  principles  about  trait- 
performance  relationships  in  the  world  of  work  .... 
(18:538) 

In  a  more  recent  article  by  Schmidt  and  Hunter  (1980), 
they  demonstrate  that 

validity  estimation  and  generalization  by  both  empirical 
and  rational  methods  will  be  widely  possible  without 
situation-specific  empirical  studies. 


Estimates of the dollar impact of selection instruments
on productivity will be much more frequently computed,
and many who are currently skeptical will become convinced
of the critical importance of selection to organizational
success. (19:42)


Decision  Rules 

According to Schmidt and Hunter, "a valid test will show
a statistically significant validity in only about 50% of
studies" (19:44). And, as previously stated, evidence suggests
"that much of the variance in the outcomes of validity studies
. . . may be due to statistical artifacts" (19:44). Further,

if the variance in validity coefficients across situations
for job-test combinations is due to statistical
artifacts, then . . . the doctrine of situational specificity
is false and validities are generalizable. (19:44)

Schmidt, Hunter and others "developed a method for testing this
hypothesis" (19:44).

[Starting] with a fairly large number of validity
coefficients for a given test-job combination, . . .
[compute] the variance of [the] distribution. From
this variance, . . . [subtract] variance due to various
sources of error. (19:44-45)

The seven commonly acknowledged sources of error variance
include:

1.  Differences  between  studies  in  criterion  reliability. 

2.  Differences  between  studies  in  test  reliability. 

3.  Differences  between  studies  in  range  restriction. 

4.  Sampling  error  (i.e.,  variance  due  to  N < ∞).

5. Differences between studies in amount and kind of
criterion contamination and deficiency (Brogden
and Taylor, 1950).

6. Computational and typographical errors (Wolins, 1962).

7. Slight differences in factor structure between tests
of a given type (e.g., arithmetic reasoning tests).

(19:45)
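
The variance subtraction described above can be sketched for the sampling-error source alone. The Python fragment below computes a sample-size-weighted mean correlation, the observed variance of the correlations, and the variance expected from sampling error, using the common approximation (1 - mean r squared) squared, divided by (average N - 1). The function name, the choice of approximation, and the two example studies are assumptions made for illustration; the other sources of error listed above are not modeled, and this is not the program used in this study.

```python
def bare_bones_summary(studies):
    """Summarize a set of validity studies, correcting for sampling error only.

    studies is a list of (sample_size, correlation) pairs. The returned
    dictionary keys are illustrative names, not output labels of any
    published program.
    """
    total_n = sum(n for n, _ in studies)
    # Sample-size-weighted mean of the observed correlations.
    mean_r = sum(n * r for n, r in studies) / total_n
    # Sample-size-weighted variance of the observed correlations.
    observed_var = sum(n * (r - mean_r) ** 2 for n, r in studies) / total_n
    # Variance expected from sampling error alone (assumed approximation).
    mean_n = total_n / len(studies)
    sampling_var = (1.0 - mean_r ** 2) ** 2 / (mean_n - 1.0)
    # Variance remaining after the subtraction; floored at zero.
    residual_var = max(observed_var - sampling_var, 0.0)
    if observed_var > 0:
        share_artifactual = min(sampling_var / observed_var, 1.0)
    else:
        share_artifactual = 1.0
    return {
        "total_n": total_n,
        "mean_r": mean_r,
        "observed_var": observed_var,
        "sampling_var": sampling_var,
        "residual_var": residual_var,
        "share_artifactual": share_artifactual,
    }


if __name__ == "__main__":
    # Two hypothetical studies, not data from this thesis.
    summary = bare_bones_summary([(100, 0.20), (100, 0.40)])
    for key, value in summary.items():
        print(key, round(value, 4))
```

When the share of variance attributable to artifacts approaches 100 percent, little or no real variation in validity across situations remains to be explained.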

Separate studies have shown that artifactual variance may
account for as much as 100% of the observed variance for certain
distributions. "Thus there is now strong evidence that
the observed variation in validities from study to study for
similar test-job combinations is artifactual in nature" (19:45-46).

Therefore,  after  making  the  appropriate  corrections  to 
the  mean  and  standard  deviation  of  a  validity  distribution 

one may find that a large percentage, say 90%, of
all values in the distribution lie above the minimum
useful level of validity. In such a case, one can
conclude with 90% confidence that the estimate of
true validity would be at or above this minimum level
.... (19:46)

Note,  however,  that  Schmidt  and  Hunter  continue  by  stating 
that 

the  best  estimate  of  test  validity  is  the  mean  of 
the  corrected  validity  distribution,  not  the  value 
at  the  foot  of  the  90  percent  confidence  interval. 

(19:46) 
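
As a hedged illustration of this decision rule, the Python sketch below computes the value above which a chosen proportion (here 90 percent) of the corrected validity distribution lies, and compares it with a minimum useful validity. It assumes the corrected distribution is normal; the function names and the example figures are hypothetical and are not results of this study.

```python
from statistics import NormalDist


def credibility_value(mean_validity, sd_validity, proportion_above=0.90):
    """Return the validity exceeded by the given proportion of the distribution.

    Assumes the corrected validity distribution is normal (an assumption of
    this sketch). For 90 percent the multiplier is about -1.28.
    """
    z = NormalDist().inv_cdf(1.0 - proportion_above)
    return mean_validity + z * sd_validity


def validity_generalizes(mean_validity, sd_validity, minimum_useful,
                         proportion_above=0.90):
    """True when the credibility value is at or above the minimum useful level."""
    lower = credibility_value(mean_validity, sd_validity, proportion_above)
    return lower >= minimum_useful


if __name__ == "__main__":
    # Hypothetical corrected mean and standard deviation.
    print(round(credibility_value(0.50, 0.10), 3))
    print(validity_generalizes(0.50, 0.10, minimum_useful=0.20))
```

The mean remains the best single estimate of validity; the lower value only indicates whether generalization is defensible.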

Further,  due  to 

a number of [recent] significant developments in
personnel psychology resulting . . . from rejection
of the erroneous belief in the law of small numbers
... it now appears likely that rational estimates
[based on dollar utility gain formulas] of test
validity may be found sufficiently accurate for operational
use. (19:54)
Summary

Two hundred five data points will be input into the
Schmidt-Hunter Validity Generalization Model. These data
include the sample size and correlation coefficients resulting
from job performance criterion studies of seven predictors.
The cumulative sample sizes range from 951 for the Wonderlic
Personnel Test to 5533 for the General Mental Ability Measures.
Analysis of the output data will identify the predictors with
the greatest predictive validity for the successful performance
of first-line supervisors.


Seven predictors based upon job performance criterion
are analyzed using the Schmidt-Hunter Validity Generalization
Model. To facilitate analysis the predictors were coded as
follows:

01 - Bennett Mechanical Comprehension Test
02 - General Mental Ability Measures
03 - Wonderlic Personnel Test
04 - Otis Mental Ability Test
05 - How Supervise?
06 - Leadership Opinion Questionnaire (Initiating Structure)
07 - Leadership Opinion Questionnaire (Consideration)
The  measurement  criterion,  job  performance,  was  the  same 
for  all  predictors  and  was  based,  generally,  upon  the  ratings 
of  second-level  supervisors.  Since  both  manufacturing  and 
service  organizations  were  included  in  the  studies,  sample 
job  titles  were  classified  either  as  "foremen"  or  "supervisors." 

The remainder of this section briefly describes the output
data generated for each of the predictors.


Meta-analysis, as discussed in the preceding chapter of
this study, is the cumulation of results from numerous studies.
The validity generalization model, therefore, cumulates data
from a large number of studies and identifies the cumulative
sample size for each predictor as the "total N." The total
number of data points entered into the model (the correlation
coefficients from the data base) is designated "No. Rs" for
each predictor. The actual observed standard deviation and
the predicted standard deviation are generated for each
predictor. It is noted that for all seven predictors used in
this study, the predicted standard deviation is lower than the
observed standard deviation. Pearlman and others addressed
this situation, stating

Within a given set of validity distributions
representing a variety of job family-test type
combinations, there are likely to be some distributions in
which the three unassessed sources of variance are
present to varying degrees .... In distributions
of [this] type, we would expect the predicted standard
deviation to fall below the observed standard
deviation to varying degrees. (16:384)
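
The relationship between the observed and predicted standard deviations can be illustrated with a minimal sketch. It assumes a "bare-bones" calculation in which the predicted variance reflects sampling error only, using the commonly cited approximation (1 - mean r squared) squared, divided by (average N - 1). The model used in this study also predicts variance arising from differences in reliability and range restriction, which the sketch omits. The function name and the three studies are hypothetical and are not drawn from the data base.

```python
import math

def bare_bones_summary(studies):
    """Summarize one predictor from (sample_size, observed_r) pairs."""
    total_n = sum(n for n, _ in studies)   # cumulative sample size ("total N")
    num_rs = len(studies)                  # number of correlations ("No. Rs")

    # Sample-size-weighted mean of the observed correlations.
    mean_r = sum(n * r for n, r in studies) / total_n

    # Sample-size-weighted variance of the observed correlations.
    var_observed = sum(n * (r - mean_r) ** 2 for n, r in studies) / total_n

    # Variance expected from sampling error alone (assumed approximation),
    # evaluated at the average sample size per study.
    mean_n = total_n / num_rs
    var_predicted = (1 - mean_r ** 2) ** 2 / (mean_n - 1)

    return {
        "total_n": total_n,
        "num_rs": num_rs,
        "mean_r": mean_r,
        "sd_observed": math.sqrt(var_observed),
        "sd_predicted": math.sqrt(var_predicted),
    }

# Hypothetical studies, for illustration only.
example_studies = [(50, 0.10), (100, 0.30), (150, 0.50)]
summary = bare_bones_summary(example_studies)
print(summary)
```

With these made-up inputs the predicted standard deviation falls below the observed one, the same pattern reported for all seven predictors; the gap is variance that sampling error alone does not explain.
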

The percentage of observed variance that the model's
formulas attribute to sampling error, measurement error
(criterion and test reliability), and range variation across
the studies is output as "% var acct for." "Residual SD," the
residual standard deviation, is the square root of the variance
remaining after mathematical correction for the statistical
artifacts (16:383-384). The output data include both "mean R,"
"an estimate of the fully corrected mean validity" (16:402)
coefficient, corrected for test and criterion unreliability
(measurement error) and range restriction, and "mean R unres,"
"an estimate of the unrestricted mean validity" (16:402), the
mean observed validity coefficient corrected for range
restriction but not for attenuation due to measurement error
(16:402). "True R" denotes the mean of the true validity
distribution, the correlation coefficient of the predictor
corrected for the three types of artifacts (16:404-405). The
"True R" and "True R Attenuated" are identical figures for all
predictors used in this study except 02, General Mental Ability
Measures. This is explained by the fact that the validity
coefficient for the distribution of those predictors with a
single predictor criterion is 1.0, while Predictor 02,
consisting of several different tests, necessitated use of the
model's assumed distribution for the predictor validity.
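
The "% var acct for" and "Residual SD" figures can be related to one another with a short sketch. It assumes the conventional definitions: the percentage is the artifact-predicted variance divided by the observed variance, and the residual standard deviation is the square root of whatever variance is left. The function names and the input variances are hypothetical; the sketch is not the model's own code and is not claimed to reproduce its output.

```python
import math

def percent_variance_accounted_for(var_observed, var_sampling, var_other=0.0):
    """Percent of observed variance attributed to statistical artifacts."""
    var_artifacts = var_sampling + var_other
    # Cap at 100: artifacts cannot explain more variance than was observed.
    return min(100.0, 100.0 * var_artifacts / var_observed)

def residual_sd(var_observed, var_sampling, var_other=0.0):
    """Square root of the variance remaining after removing artifact variance."""
    remaining = var_observed - (var_sampling + var_other)
    # Floor at zero so a fully explained distribution yields an SD of 0.
    return math.sqrt(max(remaining, 0.0))

# Hypothetical variances, for illustration only.
var_observed = 0.0400   # variance of the observed validity coefficients
var_sampling = 0.0075   # variance attributed to sampling error
var_other = 0.0025      # variance attributed to the other artifacts

print(percent_variance_accounted_for(var_observed, var_sampling, var_other))
print(residual_sd(var_observed, var_sampling, var_other))
```

In this made-up case the artifacts account for about a quarter of the observed variance, so most of the spread among the coefficients remains in the residual standard deviation.
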

"The  validity  value  at  or  above  which  90%  of  all  esti¬ 
mates  of  true  validities  lie"  (16:387)  is  called  the  credibil¬ 
ity  value.  "95  cred  value"  and  "90  cred  value"  signify  the 
lower bounds of the confidence intervals at 95% and 90%,
respectively. Finally, the amount of variance corrected for
due to sampling error only, and the amount of variance
corrected for due to the other artifacts, are designated as
"Sample Size Var" and "Other Artif Var" in the output data.

Analysis 

A data table of meaningful output has been constructed
for ease of comparison of the significant statistics (see
Table I). According to Schmidt and Hunter, "the best estimate
of test validity is the mean of the corrected validity
distribution" (19:46). Thus, the True R generated by the model
is, perhaps, the most meaningful statistic in determining the
predictive validity of a particular predictor. Comparing
True Rs for the seven predictors, it is readily discernible
from the Table that Predictors 05, How Supervise?, and 02,
the General Mental Ability Measures, have the highest degree
of predictive validity, with correlation coefficients of .692
and .655, respectively. The percent of variance accounted for
in Predictor 05 is somewhat low (only 25.4%), which suggests
that some degree of situational specificity may be present,
but this does not alter the high degree of predictive validity
evidenced by the correlation coefficient.

Predictors 01 and 04, the Bennett Mechanical Comprehension
Test and the Otis Mental Ability Test, also demonstrate
a high degree of predictive validity, with correlation
coefficients of .582 and .574, respectively. These predictors
also exhibit high percentages of variance accounted for: 46.4%
for the Bennett Mechanical Comprehension Test and 83.1% for
the Otis Mental Ability Test.

Further, it is easily recognized that Predictors 06 and
07, both forms of the Leadership Opinion Questionnaire, show
no significant predictive validity for the successful
performance of first-line supervisors. Predictors with True Rs
near 0, and credibility values which are negative, evidence
no predictive validity which can be generalized across studies.

The Wonderlic Personnel Test, Predictor 03, although not
as highly predictive as 01, 02, 04, and 05, could prove
useful if no other predictive measures are available.
It is also noteworthy that the percentage of
variance accounted for in the Wonderlic is quite high at 67.6%.

TABLE I

Comparison of Meaningful Statistics for Predictors

Pred   Total   Total   True    % Var      Res     90%      95%
No.    N       Rs      R       Acct For   SD      C.V.*    C.V.*

01     1601    22      .582    46.4       .135     .376     .336
02     5533    80      .655    45.2       .142     .442     .402
03      951    18      .512    67.6       .098     .280     .236
04     1869    29      .574    83.1       .059     .357     .316
05     1441    15      .692    25.4       .210     .490     .451
06     1228    19      .060    85.4       .051    -.143    -.182
07     1477    22      .134    28.2       .188    -.060    -.097

*C.V. = credibility value
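
The comparison drawn from Table I can be retraced with a small sketch that transcribes the table and sorts it by True R. The record layout and variable names are hypothetical, and treating a positive 90% credibility value as the mark of generalizable validity is an illustrative reading of the Analysis discussion rather than a rule stated by the model.

```python
from collections import namedtuple

Row = namedtuple("Row", "name total_n num_rs true_r pct_var res_sd cv90 cv95")

# Values transcribed from Table I.
TABLE_I = {
    "01": Row("Bennett Mechanical Comprehension Test", 1601, 22, 0.582, 46.4, 0.135, 0.376, 0.336),
    "02": Row("General Mental Ability Measures", 5533, 80, 0.655, 45.2, 0.142, 0.442, 0.402),
    "03": Row("Wonderlic Personnel Test", 951, 18, 0.512, 67.6, 0.098, 0.280, 0.236),
    "04": Row("Otis Mental Ability Test", 1869, 29, 0.574, 83.1, 0.059, 0.357, 0.316),
    "05": Row("How Supervise?", 1441, 15, 0.692, 25.4, 0.210, 0.490, 0.451),
    "06": Row("LOQ (Initiating Structure)", 1228, 19, 0.060, 85.4, 0.051, -0.143, -0.182),
    "07": Row("LOQ (Consideration)", 1477, 22, 0.134, 28.2, 0.188, -0.060, -0.097),
}

# Rank predictor codes from highest to lowest True R.
ranked = sorted(TABLE_I, key=lambda code: TABLE_I[code].true_r, reverse=True)

# Illustrative screen: keep predictors whose 90% credibility value is positive.
generalizable = [code for code in ranked if TABLE_I[code].cv90 > 0]

# Cross-checks against the figures quoted in the Summary.
total_rs = sum(row.num_rs for row in TABLE_I.values())
smallest_n = min(row.total_n for row in TABLE_I.values())
largest_n = max(row.total_n for row in TABLE_I.values())

print(ranked)
print(generalizable)
print(total_rs, smallest_n, largest_n)
```

The ordering places How Supervise? and the General Mental Ability Measures first and the two Leadership Opinion Questionnaire scales last, and the number of correlations sums to the 205 data points described in the Summary.
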
IV.  Conclusion


Based upon the research conducted in this study, it is
concluded that at least two predictors that correlate
significantly with successful job performance can be
administered to first-time supervisory candidates. Those
predictors are the General Mental Ability Measures (including
such tests as the Otis, Wonderlic, Scholastic Aptitude Test
[SAT], Army General Classification Test, Purdue Adaptability
Test, Thurstone Mental Abilities Test, and the General
Aptitude Test Battery) and the How Supervise? questionnaire.

Additionally, two other predictors, the Bennett Mechanical
Comprehension Test and the Otis Mental Ability Test,
have a high degree of predictive validity for successful
first-line supervisory performance. Another predictor, the
Wonderlic Personnel Test, although not as highly predictive
as those cited above, may prove valuable if none of the other
predictors are available to the personnel selectors.

Findings from numerous studies provide a useful data
base for the Schmidt-Hunter Validity Generalization Model.
Analysis of the output from this model identifies existing
criteria for predicting the probability of successful
performance by first-time, first-line supervisors. When
managers selecting first-time, first-line supervisors use
predictors found to correlate significantly with successful
job performance, the probability that the selected candidate
will perform at a higher-than-average level in his/her new
position is expected to increase substantially.